Crowdsourcing, in which human intelligence and productivity are dynamically mobilized to tackle tasks too complex for automation alone to handle, has grown into an important research topic and inspired new businesses (e.g., Uber, Airbnb). Over the years, crowdsourcing has morphed from a platform where workers and tasks are matched up manually into one that leverages data-driven algorithmic management approaches powered by artificial intelligence (AI) to achieve increasingly sophisticated optimization objectives. In this paper, we provide a survey presenting a unique systematic overview of how AI can empower crowdsourcing, which we refer to as AI-Empowered Crowdsourcing (AIEC). We propose a taxonomy which divides algorithmic crowdsourcing into three major areas: 1) task delegation, 2) motivating workers, and 3) quality control, focusing on the major objectives which need to be accomplished. We discuss the limitations and insights, and curate the challenges of doing research in each of these areas to highlight promising future research directions.
translated by 谷歌翻译
Federated learning has recently been applied to recommendation systems to protect user privacy. In federated learning settings, recommendation systems can train recommendation models by collecting only the intermediate parameters instead of the real user data, which greatly enhances user privacy. Besides, federated recommendation systems can collaborate with other data platforms to improve recommendation model performance while meeting regulatory and privacy constraints. However, federated recommendation systems face many new challenges such as privacy, security, heterogeneity and communication costs. While significant research has been conducted in these areas, gaps in the survey literature still exist. In this survey, we (1) summarize some common privacy mechanisms used in federated recommendation systems and discuss the advantages and limitations of each mechanism; (2) review some robust aggregation strategies and several novel attacks against security; (3) summarize some approaches to address heterogeneity and communication cost problems; (4) introduce some open source platforms that can be used to build federated recommendation systems; (5) present some prospective research directions for the future. This survey can help researchers and practitioners understand the research progress in these areas.
Learning semantic-rich representations from raw unlabeled time series data is critical for downstream tasks such as classification and forecasting. Contrastive learning has recently shown promising representation learning capability in the absence of expert annotations. However, existing contrastive approaches generally treat each instance independently, which leads to false negative pairs that share the same semantics. To tackle this problem, we propose MHCCL, a Masked Hierarchical Cluster-wise Contrastive Learning model, which exploits semantic information obtained from the hierarchical structure consisting of multiple latent partitions for multivariate time series. Motivated by the observation that fine-grained clustering preserves higher purity while coarse-grained clustering reflects higher-level semantics, we propose a novel downward masking strategy to filter out fake negatives and supplement positives by incorporating the multi-granularity information from the clustering hierarchy. In addition, a novel upward masking strategy is designed in MHCCL to remove outliers of clusters at each partition to refine prototypes, which helps speed up the hierarchical clustering process and improves the clustering quality. We conduct experimental evaluations on seven widely-used multivariate time series datasets. The results demonstrate the superiority of MHCCL over the state-of-the-art approaches for unsupervised time series representation learning.
Question Generation (QG), as a challenging Natural Language Processing task, aims at generating questions based on given answers and context. Existing QG methods mainly focus on building or training models for specific QG datasets. These works are subject to two major limitations: (1) They are dedicated to specific QG formats (e.g., answer-extraction or multi-choice QG); therefore, if we want to address a new format of QG, a re-design of the QG model is required. (2) Optimal performance is only achieved on the dataset they were just trained on. As a result, we have to train and keep various QG models for different QG datasets, which is resource-intensive and ungeneralizable. To solve these problems, we propose a model named Unified-QG based on lifelong learning techniques, which can continually learn QG tasks across different datasets and formats. Specifically, we first build a format-convert encoding to transform different kinds of QG formats into a unified representation. Then, a method named STRIDER (SimilariTy RegularIzed Difficult Example Replay) is built to alleviate catastrophic forgetting in continual QG learning. Extensive experiments were conducted on 8 QG datasets across 4 QG formats (answer-extraction, answer-abstraction, multi-choice, and boolean QG) to demonstrate the effectiveness of our approach. Experimental results demonstrate that our Unified-QG can effectively and continually adapt to QG tasks when datasets and formats vary. In addition, we verify the ability of a single trained Unified-QG model to improve the performance of 8 Question Answering (QA) systems by generating synthetic QA data.
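Cross-modal hashing (CMH) is one of the most promising methods for cross-modal approximate nearest neighbor search. Most CMH solutions ideally assume that the labels of the training and testing sets are identical. However, this assumption is often violated, giving rise to the zero-shot CMH problem. Recent efforts to address this problem focus on using label attributes to transfer knowledge to unseen classes. However, these attributes are isolated from the features of the multi-modal data. To reduce this information gap, we introduce an approach called LAEH (Label Attributes Embedding for zero-shot cross-modal Hashing). LAEH first obtains the initial semantic attribute vectors of labels with a word2vec model and then uses a transformation network to map them into a common subspace. Next, it leverages the hash vectors and the feature similarity matrix to guide the feature extraction networks of the different modalities. At the same time, LAEH uses attribute similarity as a supplement to label similarity in order to rectify the label embedding and the common subspace. Experiments show that LAEH outperforms related representative zero-shot and cross-modal hashing methods.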
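We raise and define a new crowdsourcing scenario, open set crowdsourcing, in which we know only the general theme of an unfamiliar crowdsourcing project and do not know its label space, that is, the set of possible labels. This is still a task annotation problem, but the unfamiliarity with the tasks and the label space hampers the modelling of tasks and workers, as well as truth inference. We propose an intuitive solution, OSCrowd. First, OSCrowd integrates datasets related to the crowd theme into a large source domain to facilitate partial transfer learning, which approximates the label space inference for these tasks. Next, it assigns a weight to each source domain based on category correlation. After this, it uses multi-source open set transfer learning to model the crowd tasks and assign possible annotations. The label space and annotations given by transfer learning are then used to guide and standardize the crowd workers' annotations. We validate OSCrowd in an online scenario and show that it solves the open set crowdsourcing problem and works better than related crowdsourcing solutions.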
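Due to the unreliability of Internet workers, it is difficult to complete a crowdsourcing project satisfactorily, especially when the tasks are numerous and the budget is limited. Recently, meta learning has brought new vitality to few-shot learning, making it possible to obtain a classifier with fair performance using only a few training samples. Here we introduce the concept of the meta-worker, a machine annotator trained by meta learning for types of tasks (i.e., image classification) that are well suited to AI. Unlike regular crowd workers, meta-workers can be reliable, stable and, more importantly, tireless and free. We first cluster the unlabeled data and ask crowd workers to repeatedly annotate the instances near the cluster centers; we then leverage the annotated data and meta-training datasets to build a group of meta-workers using different meta learning algorithms. Subsequently, the meta-workers are asked to annotate the remaining crowdsourced tasks. The Jensen-Shannon divergence is used to measure the disagreement among the annotations provided by the meta-workers, which determines whether crowd workers should be invited to further annotate the same task. Finally, we model the meta-workers' preferences and compute the consensus annotation by weighted majority voting. Our empirical study confirms that, by combining machine and human intelligence, we can complete a crowdsourcing project with a lower budget than state-of-the-art task assignment methods while achieving superior or comparable quality.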
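The disagreement check and the consensus step described in this abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: it assumes each meta-worker outputs a class-probability vector, uses the equal-weight generalized Jensen-Shannon divergence, and the function names, the `threshold` value, and the per-worker weights are all hypothetical.

```python
import math
from collections import defaultdict


def entropy(p):
    """Shannon entropy in bits; zero-probability entries contribute nothing."""
    return -sum(x * math.log2(x) for x in p if x > 0)


def js_divergence(dists):
    """Generalized Jensen-Shannon divergence of several distributions.

    Computed as H(mean of distributions) - mean of H(distribution),
    with every meta-worker weighted equally (an assumption).
    """
    n = len(dists)
    mean = [sum(col) / n for col in zip(*dists)]
    return entropy(mean) - sum(entropy(d) for d in dists) / n


def needs_crowd_workers(dists, threshold=0.5):
    """Hypothetical rule: escalate to human workers when disagreement is high."""
    return js_divergence(dists) > threshold


def weighted_majority_vote(labels, weights):
    """Return the label whose supporting annotators have the largest total weight."""
    scores = defaultdict(float)
    for label, weight in zip(labels, weights):
        scores[label] += weight
    return max(scores, key=scores.get)
```

For instance, two meta-workers that put all probability mass on different classes give a divergence of 1 bit, which would trigger human annotation under the illustrative threshold, while identical predictions give 0.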
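With the increasing prevalence of major depressive disorder (MDD), many researchers have turned their attention to its recognition and treatment. Existing MDD recognition algorithms always use a single time-frequency domain method, but such a method is too simple and is not conducive to modelling the complex link relationships between brain functions. To solve this problem, this paper proposes a recognition method for major depressive disorder based on multi-layer brain functional connectivity networks (MBFCN) and conducts a cognitive analysis. The cognitive analysis based on the proposed MBFCN finds that the Alpha-Beta1 frequency band is the key sub-band for recognizing MDD. The connections between the right prefrontal lobe and the temporal lobe in extremely depressed disorders (EDD) are deficient in the brain functional connectivity networks (BFCN) based on the phase lag index (PLI). Furthermore, potential biomarkers can be found through the significance analysis of depression features and PHQ-9.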
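For readers unfamiliar with the phase lag index, the standard definition is the absolute value of the time-averaged sign of sin(Δφ), where Δφ is the instantaneous phase difference between two channels. The sketch below illustrates that definition only; it is not taken from the paper, the function name is hypothetical, and it assumes the instantaneous phases (in radians) have already been extracted, for example with a Hilbert transform.

```python
import math


def phase_lag_index(phases_a, phases_b):
    """Phase lag index between two equally long phase series (radians).

    PLI = | mean over time of sign(sin(phase_a - phase_b)) |.
    It is 0 when there is no consistent lead or lag and 1 when one
    signal consistently leads the other.
    """
    if len(phases_a) != len(phases_b) or not phases_a:
        raise ValueError("phase series must be non-empty and of equal length")
    total = 0
    for a, b in zip(phases_a, phases_b):
        s = math.sin(a - b)
        total += (s > 0) - (s < 0)  # sign of s as -1, 0 or +1
    return abs(total) / len(phases_a)
```

Computing this value for every pair of channels yields a symmetric connectivity matrix, which is the usual way a PLI-based functional connectivity network is represented.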
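Noisy labels are commonly found in real-world data and cause performance degradation in deep neural networks. Cleaning data manually is labour-intensive and time-consuming. Previous research has mostly focused on making classification models robust to noisy labels, while the robustness of deep metric learning (DML) against noisy labels remains less explored. In this paper, we bridge this important gap by proposing the Probabilistic Ranking-based Instance Selection with Memory (PRISM) approach for DML. PRISM calculates the probability that a label is clean and filters out potentially noisy samples. Specifically, we propose a novel method, the von Mises-Fisher Distribution Similarity (vMF-Sim), which calculates this probability by estimating a von Mises-Fisher (vMF) distribution for each data class. Compared with the existing average similarity method (AvgSim), vMF-Sim considers the variance of each class in addition to the average similarity. With this design, the proposed approach can cope with challenging DML situations in which the majority of the samples are noisy. Extensive experiments on both synthetic and real-world noisy datasets show that the proposed approach achieves up to 8.37% higher Precision@1 within a reasonable training time.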
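Adversarial training (AT), formulated as a minimax optimization problem, can effectively enhance a model's robustness against adversarial attacks. Existing AT methods mainly focus on manipulating the inner maximization to generate quality adversarial variants, or on manipulating the outer minimization to design effective learning objectives. However, the empirical results of AT always exhibit robustness at odds with accuracy and the existence of the cross-over mixture problem, which motivates us to study whether some label randomness can benefit AT. First, we thoroughly investigate noisy label (NL) injection into the inner maximization and the outer minimization of AT, respectively, and obtain observations on when NL injection benefits AT. Second, based on these observations, we propose a simple but effective method, NoiLIn, that randomly injects NLs into the training data at each training epoch and dynamically increases the NL injection rate once robust overfitting occurs. Empirically, NoiLIn can significantly mitigate AT's undesirable issue of robust overfitting and even further improve the generalization of state-of-the-art AT methods. Philosophically, NoiLIn sheds light on a new perspective on learning with NLs: NLs should not always be deemed detrimental, and even when the training set contains no NLs, we may consider injecting them deliberately. Code is available at https://github.com/zjfheart/noilin.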
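The injection schedule described in this abstract could look roughly like the sketch below. It is an illustration under stated assumptions rather than the code from the linked repository: the symmetric label flipping, the function names, and the `step` and `max_rate` values are hypothetical, and how robust overfitting is detected is left to the caller.

```python
import random


def inject_noisy_labels(labels, num_classes, rate, rng):
    """Return a copy of `labels` with a `rate` fraction flipped to a different class.

    Assumption: flipped labels are drawn uniformly from the other classes.
    """
    noisy = list(labels)
    n_flip = int(round(rate * len(noisy)))
    for i in rng.sample(range(len(noisy)), n_flip):
        other_classes = [c for c in range(num_classes) if c != noisy[i]]
        noisy[i] = rng.choice(other_classes)
    return noisy


def next_rate(rate, robust_overfitting, step=0.05, max_rate=0.6):
    """Raise the injection rate once robust overfitting is flagged (hypothetical rule)."""
    return min(rate + step, max_rate) if robust_overfitting else rate
```

In a training loop, one would call `inject_noisy_labels` on the clean labels at the start of every epoch, so that a fresh random subset is corrupted each time, and update the rate with `next_rate` after each validation pass.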